Search | WHO COVID-19 Research Database

1.

SARS-CoV-2 lineage assignments using phylogenetic placement/UShER are superior to pangoLEARN machine learning method (preprint)

Adriano de Bernardi Schneider; Michelle Su; Angie S Hinrichs; Jade Wang; Helly Amin; John Bell; Debra A Wadford; Áine O'Toole; Emily Scher; Marc D Perry; Yatish Turakhia; Nicola De Maio; Scott Hughes; Russ Corbett-Detig.

biorxiv; 2023.

Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2023.05.26.542489

ABSTRACT

With the rapid spread and evolution of SARS-CoV-2, the ability to monitor its transmission and distinguish among viral lineages is critical for pandemic response efforts. The most commonly used software for the lineage assignment of newly isolated SARS-CoV-2 genomes is pangolin, which offers two methods of assignment, pangoLEARN and pUShER. PangoLEARN rapidly assigns lineages using a machine learning algorithm, while pUShER performs a phylogenetic placement to identify the lineage corresponding to a newly sequenced genome. In a preliminary study, we observed that pangoLEARN (decision tree model), while substantially faster than pUShER, offered less consistency across different versions of pangolin v3. Here, we expand upon this analysis to include v3 and v4 of pangolin, which moved the default algorithm for lineage assignment from pangoLEARN in v3 to pUShER in v4, and perform a thorough analysis confirming that pUShER is not only more stable across versions but also more accurate. Our findings suggest that future lineage assignment algorithms for various pathogens should consider the value of phylogenetic placement.

2.

Mutational spectra distinguish SARS-CoV-2 replication niches (preprint)

Chris Ruis; Thomas P Peacock; Luis Mariano Polo; Diego Masone; Maria Soledad Alvarez; Angie S Hinrichs; Yatish Turakhia; Cheng Ye; Jakob McBroome; Russ Corbett-Detig; Julian Parkhill; Rodrigo Andres Floto.

biorxiv; 2022.

Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2022.09.27.509649

ABSTRACT

Exposure to different mutagens leaves distinct mutational patterns that can allow prediction of pathogen replication niches (Ruis 2022). We therefore hypothesised that analysis of SARS-CoV-2 mutational spectra might show lineage-specific differences, dependent on the dominant site(s) of replication and onwards transmission, and could therefore rapidly infer virulence of emergent variants of concern (VOC; Konings 2021). Through mutational spectrum analysis, we found a significant reduction in G>T mutations in Omicron, which replicates in the upper respiratory tract (URT), compared to other lineages, which replicate in both upper and lower respiratory tracts (LRT). Mutational analysis of other viruses and bacteria indicates a robust, generalisable association of high G>T mutations with replication within the LRT. Monitoring G>T mutation rates over time, we found early separation of Omicron from Beta, Gamma and Delta, while the mutational burden in Alpha varied consistent with changes in transmission source as social restrictions were lifted. This supports the use of mutational spectra to infer niches of established and emergent pathogens.
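As a rough illustration of what a mutational spectrum is, the sketch below tallies substitution types for two sets of mutations and compares their G>T proportions. It is a simplified example and not the authors' pipeline: the function name `mutational_spectrum` and the toy data are assumptions made here, and a real analysis would typically also account for sequence context and base composition.

```python
from collections import Counter

def mutational_spectrum(mutations):
    """Return the proportion of each substitution type, e.g. 'G>T'.

    `mutations` is an iterable of (ancestral_base, derived_base) pairs.
    Pairs with identical bases are ignored.
    """
    counts = Counter(f"{anc}>{der}" for anc, der in mutations if anc != der)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: n / total for kind, n in counts.items()}

# Toy, made-up mutation lists for two hypothetical lineages.
lineage_a = [("G", "T"), ("C", "T"), ("G", "T"), ("A", "G")]
lineage_b = [("C", "T"), ("C", "T"), ("G", "T"), ("A", "G")]

spec_a = mutational_spectrum(lineage_a)
spec_b = mutational_spectrum(lineage_b)
print(spec_a.get("G>T", 0.0), spec_b.get("G>T", 0.0))
```

Comparing the G>T entry between lineages is the kind of contrast the abstract describes, although any statistical testing is outside this sketch.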

3.

DecentTree: Scalable Neighbour-Joining for the Genomic Era (preprint)

Weiwen Wang; James Barbetti; Thomas Wong; Bryan Thornlow; Russ Corbett-Detig; Yatish Turakhia; Robert Lanfear; Bui Quang Minh.

biorxiv; 2022.

Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2022.04.10.487712

ABSTRACT

Summary: Neighbour-Joining (NJ) is one of the most widely used distance-based phylogenetic inference methods. However, current implementations do not scale well for datasets with more than 10,000 sequences. Given the increasing pace of generating new sequence data, particularly in outbreaks of emerging diseases, and the already enormous existing databases of sequence data for which NJ is a useful approach, new implementations of existing methods are warranted. Here we present DecentTree, which provides highly optimised and parallel implementations of Neighbour-Joining and several of its variants. DecentTree is designed as a stand-alone application and a header-only library easily integrated with other phylogenetic software (e.g. it is integral in the popular IQ-TREE software). We show that DecentTree offers similar or improved performance over existing software (BIONJ, Quicktree, FastME, and RapidNJ), especially for handling very large alignments. For example, DecentTree is up to 6-fold faster than the fastest existing Neighbour-Joining software (e.g. RapidNJ) when generating a tree of 64,000 SARS-CoV-2 genomes. Availability and implementation: DecentTree is open source and freely available at https://github.com/iqtree/decenttree. Contact: Minh Bui: m.bui@anu.edu.au; Robert Lanfear: rob.lanfear@anu.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online.
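For readers unfamiliar with the method, the following is a minimal sketch of the textbook Neighbour-Joining algorithm in pure Python. It is not DecentTree's implementation (which the abstract describes as optimised and parallel); the function name `neighbour_joining` is an assumption, the code runs in cubic time, and it expects at least three taxa.

```python
def neighbour_joining(names, dist):
    """Textbook Neighbour-Joining; returns an unrooted tree as a Newick string.

    names: list of taxon labels (at least three)
    dist:  symmetric matrix (list of lists) of pairwise distances
    """
    nodes = list(names)
    # Distance table keyed by node label; internal nodes use their Newick text.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}

    while len(nodes) > 3:
        n = len(nodes)
        # Row sums over the currently active nodes.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r_a - r_b.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the joined pair to the new internal node.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        new = f"({a}:{la:g},{b}:{lb:g})"
        # Distances from the new node to every remaining node.
        d[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]

    # Resolve the final three nodes around a single central node.
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"

# Small illustrative distance matrix (toy values, not real sequence data).
taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbour_joining(taxa, matrix)
print(tree)
```

The pair search and row sums are recomputed on every iteration, which is the scaling bottleneck that dedicated implementations work to reduce.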

Subject(s)

Epilepsy, Frontal Lobe

4.

Online Phylogenetics using Parsimony Produces Slightly Better Trees and is Dramatically More Efficient for Large SARS-CoV-2 Phylogenies than de novo and Maximum-Likelihood Approaches (preprint)

Bryan Thornlow; Cheng Ye; Nicola De Maio; Jakob McBroome; Angie S Hinrichs; Robert Lanfear; Yatish Turakhia; Russ Corbett-Detig.

biorxiv; 2021.

Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2021.12.02.471004

ABSTRACT

Phylogenetics has been foundational to SARS-CoV-2 research and public health policy, assisting in genomic surveillance, contact tracing, and assessing emergence and spread of new variants. However, phylogenetic analyses of SARS-CoV-2 have often relied on tools designed for de novo phylogenetic inference, in which all data are collected before any analysis is performed and the phylogeny is inferred once from scratch. SARS-CoV-2 datasets do not fit this mould. There are currently over 5 million sequenced SARS-CoV-2 genomes in public databases, with tens of thousands of new genomes added every day. Continuous data collection, combined with the public health relevance of SARS-CoV-2, invites an "online" approach to phylogenetics, in which new samples are added to existing phylogenetic trees every day. The extremely dense sampling of SARS-CoV-2 genomes also invites a comparison between Likelihood and Parsimony approaches to phylogenetic inference. Maximum Likelihood (ML) methods are more accurate when there are multiple changes at a single site on a single branch, but this accuracy comes at a large computational cost, and the dense sampling of SARS-CoV-2 genomes means that these instances will be extremely rare. Therefore, it may be that approaches based on Maximum Parsimony (MP) are sufficiently accurate for reconstructing phylogenies of SARS-CoV-2, and their simplicity means that they can be applied to much larger datasets. Here, we evaluate the performance of de novo and online phylogenetic approaches, and ML and MP frameworks, for inferring large and dense SARS-CoV-2 phylogenies. Overall, we find that online phylogenetics produces similar phylogenetic trees to de novo analyses for SARS-CoV-2, and that MP optimizations produce more accurate SARS-CoV-2 phylogenies than do ML optimizations. 
Since MP is thousands of times faster than presently available implementations of ML and online phylogenetics is faster than de novo, we therefore propose that, in the context of comprehensive genomic epidemiology of SARS-CoV-2, MP online phylogenetics approaches should be favored.
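To make the parsimony criterion concrete, here is a minimal sketch of Fitch's small-parsimony algorithm, which counts the minimum number of changes needed to explain one alignment site on a fixed rooted binary tree. It is offered as background only and is not the software evaluated in the preprint; the nested-tuple tree encoding and the name `fitch_score` are assumptions made for this example.

```python
def fitch_score(tree, states):
    """Minimum number of state changes at one site on a rooted binary tree.

    tree:   nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    states: dict mapping each leaf name to its observed base at this site
    """
    def visit(node):
        if isinstance(node, str):
            # Leaf: its state set is the observed base, with no changes yet.
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            # Children agree on at least one state: no extra change needed.
            return common, left_cost + right_cost
        # Children disagree: take the union and count one change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]

toy_tree = (("A", "B"), ("C", "D"))
one_change = fitch_score(toy_tree, {"A": "G", "B": "G", "C": "T", "D": "T"})
two_changes = fitch_score(toy_tree, {"A": "G", "B": "T", "C": "G", "D": "T"})
print(one_change, two_changes)
```

Summing this score over all sites gives the parsimony score of a tree. The recursion is only suitable for small trees, and pandemic-scale trees would need an iterative traversal.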

5.

Pandemic-Scale Phylogenomics Reveals Elevated Recombination Rates in the SARS-CoV-2 Spike Region (preprint)

Yatish Turakhia; Bryan Thornlow; Angie S Hinrichs; Jakob Mcbroome; Nicolas Ayala; Cheng Ye; Nicola De Maio; David Haussler; Rob Lanfear; Russ Corbett-Detig.

biorxiv; 2021.

Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2021.08.04.455157

ABSTRACT

Accurate and timely detection of recombinant lineages is crucial for interpreting genetic variation, reconstructing epidemic spread, identifying selection and variants of interest, and accurately performing phylogenetic analyses. During the SARS-CoV-2 pandemic, genomic data generation has exceeded the capacities of existing analysis platforms, thereby crippling real-time analysis of viral recombination. Low SARS-CoV-2 mutation rates make detecting recombination difficult. Here, we develop and apply a novel phylogenomic method to exhaustively search a nearly comprehensive SARS-CoV-2 phylogeny for recombinant lineages. We investigate a 1.6M sample tree, and identify 606 recombination events. Approximately 2.7% of sequenced SARS-CoV-2 genomes have recombinant ancestry. Recombination breakpoints occur disproportionately in the Spike protein region. Our method empowers comprehensive real time tracking of viral recombination during the SARS-CoV-2 pandemic and beyond.

Subject(s)

Severe Acute Respiratory Syndrome

6.

A new SARS-CoV-2 lineage that shares mutations with known Variants of Concern is rejected by automated sequence repository quality control (preprint)

Bryan Thornlow; Angie S Hinrichs; Miten Jain; Namrita Dhillon; Scott La; Joshua D Kapp; Ikenna Anigbogu; Molly Cassatt-Johnstone; Jakob Mcbroome; Maximilian Haeussler; Yatish Turakhia; Terren Chang; Hugh E Olsen; Jeremy Sanford; Michael Stone; Olena Vaske; Isabel Bjork; Mark Akeson; Beth Shapiro; David Haussler; A. Marm Kilpatrick; Russ Corbett-Detig.

biorxiv; 2021.

Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2021.04.05.438352

ABSTRACT

We report a SARS-CoV-2 lineage that shares N501Y, P681H, and other mutations with known variants of concern, such as B.1.1.7. This lineage, which we refer to as B.1.x (COG-UK sometimes references similar samples as B.1.324.1), is present in at least 20 states across the USA and in at least six countries. However, a large deletion causes the sequence to be automatically rejected from repositories, suggesting that the frequency of this new lineage is underestimated using public data. Recent dynamics based on 339 samples obtained in Santa Cruz County, CA, USA suggest that B.1.x may be increasing in frequency at a rate similar to that of B.1.1.7 in Southern California. At present the functional differences between this variant B.1.x and other circulating SARS-CoV-2 variants are unknown, and further studies on secondary attack rates, viral loads, immune evasion and/or disease severity are needed to determine if it poses a public health concern. Nonetheless, given what is known from well-studied circulating variants of concern, it seems unlikely that the lineage could pose larger concerns for human health than many already globally distributed lineages. Our work highlights a need for rapid turnaround time from sequence generation to submission and improved sequence quality control that removes submission bias. We identify promising paths toward this goal.

7.

A Tethered Ligand Assay to Probe the SARS-CoV-2 ACE2 Interaction under Constant Force (preprint)

Magnus S. Bauer; Sophia Gruber; Lukas F. Milles; Thomas Nicolaus; Leonard C. Schendel; Hermann E. Gaub; Jan Lipfert; Russ Corbett-Detig; Piet Maes; Dirk Daelemans; Michael J Buchmeier; Mohammed Bouziane; Anthony B Nesburn; Baruch D Kuppermann; Lbachir BenMohamed; Volkher Scharnhorst; Heidi Ammerlaan; Kathleen Deiteren; Stephan J.L. Bakker; Lucas Joost van Pelt; Yvette Kluiters-de Hingh; Mathie P.G. Leers; Andre van der Ven; Luciana C. Ribeiro; Marcus V. Agrela; Maria Luiza Moretti; Lucas I. Buscaratti; Fernanda Crunfli; Raissa . G Ludwig; Jaqueline A. Gerhardt; Renata Seste-Costa; Julia Forato; Mariene . R Amorin; Daniel A. T. Texeira; Pierina L. Parise; Matheus C. Martini; Karina Bispo-dos-Santos; Camila L. Simeoni; Fabiana Granja; Virginia C. Silvestrini; Eduardo B. de Oliveira; Vitor M. Faca; Murilo Carvalho; Bianca G. Castelucci; Alexandre B. Pereira; Lais D. Coimbra; Patricia B. Rodrigues; Arilson Bernardo S. P. Gomes; Fabricio B. Pereira; Leonilda M. B. Santos; Andrei C. Sposito; Robson F. Carvalho; Andre S. Vieira; Marco A. R. Vinolo; Andre Damasio; Licio A. Velloso; Helder I. Nakaya; Henrique Marques-Souza; Rafael E. Marques; Daniel Martins-de-Souza; Munir S. Skaf; Jose Luiz Proenca-Modena; Pedro M. Moraes-Vieira; Marcelo A. Mori; Alessandro S. Farias.

biorxiv; 2020.

Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2020.09.27.315796

ABSTRACT

The current COVID-19 pandemic has a devastating global impact and is caused by the SARS-CoV-2 virus. SARS-CoV-2 attaches to human host cells through interaction of its receptor binding domain (RBD) located on the viral Spike (S) glycoprotein with angiotensin converting enzyme-2 (ACE2) on the surface of host cells. RBD binding to ACE2 is a critical first step in SARS-CoV-2 infection. Viral attachment occurs in dynamic environments where forces act on the binding partners and multivalent interactions play central roles, creating an urgent need for assays that can quantitate SARS-CoV-2 interactions with ACE2 under mechanical load and in defined geometries. Here, we introduce a tethered ligand assay that comprises the RBD and the ACE2 ectodomain joined by a flexible peptide linker. Using specific molecular handles, we tether the fusion proteins between a functionalized flow cell surface and magnetic beads in magnetic tweezers. We observe repeated interactions of RBD and ACE2 under constant loads and can fully quantify the force dependence and kinetics of the binding interaction. Our results suggest that the SARS-CoV-2 ACE2 interaction has higher mechanical stability, a larger free energy of binding, and a lower off-rate than that of SARS-CoV-1, the causative agent of the 2002-2004 SARS outbreak. In the absence of force, the SARS-CoV-2 RBD rapidly (within ≤1 ms) engages the ACE2 receptor if held in close proximity and remains bound to ACE2 for 400-800 s, much longer than what has been reported for other viruses engaging their cellular receptors. We anticipate that our assay will be a powerful tool to investigate the roles of mutations in the RBD that might alter the infectivity of the virus and to test the modes of action of neutralizing antibodies and other agents designed to block RBD binding to ACE2 that are currently being developed as potential COVID-19 therapeutics.

Subject(s)

COVID-19

8.

Ultrafast Sample Placement on Existing Trees (UShER) Empowers Real-Time Phylogenetics for the SARS-CoV-2 Pandemic (preprint)

Yatish Turakhia; Bryan Thornlow; Angie S Hinrichs; Nicola de Maio; Landen Gozashti; Robert Lanfear; David Haussler; Russ Corbett-Detig.

biorxiv; 2020.

Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2020.09.26.314971

ABSTRACT

As the SARS-CoV-2 virus spreads through human populations, the unprecedented accumulation of viral genome sequences is ushering in a new era of "genomic contact tracing" - that is, using viral genome sequences to trace local transmission dynamics. However, because the viral phylogeny is already so large - and will undoubtedly grow many fold - placing new sequences onto the tree has emerged as a barrier to real-time genomic contact tracing. Here, we resolve this challenge by building an efficient, tree-based data structure encoding the inferred evolutionary history of the virus. We demonstrate that our approach improves the speed of phylogenetic placement of new samples and data visualization by orders of magnitude, making it possible to complete the placements under real-time constraints. Our method also provides the key ingredient for maintaining a fully-updated reference phylogeny. We make these tools available to the research community through the UCSC SARS-CoV-2 Genome Browser to enable rapid cross-referencing of information in new virus sequences with an ever-expanding array of molecular and structural biology data. The methods described here will empower research and genomic contact tracing for laboratories worldwide. Software Availability: UShER is available to users through the UCSC Genome Browser at https://genome.ucsc.edu/cgi-bin/hgPhyloPlace. The source code and detailed instructions on how to compile and run UShER are available from https://github.com/yatisht/usher.
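The general idea of placing a new sample on an existing tree can be sketched as follows. This is a deliberately simplified illustration and not UShER's algorithm or data structure: it assumes each branch is annotated with a set of mutations, ignores back-mutations, and only considers attaching the sample at existing nodes. The function name `place_sample`, the dictionary layout, and the mutation labels are all hypothetical.

```python
def place_sample(children, branch_muts, sample_muts, root):
    """Find the tree node whose accumulated mutations best match a sample.

    children:    dict mapping node -> list of child nodes
    branch_muts: dict mapping node -> set of mutations on the branch above it
    sample_muts: set of mutations carried by the new sample
    Returns (best_node, cost), where cost is the number of mutations that
    differ between the sample and the node (a parsimony-style score).
    """
    best_node, best_cost = None, None
    # Depth-first traversal carrying the genotype accumulated from the root.
    stack = [(root, frozenset(branch_muts.get(root, ())))]
    while stack:
        node, genotype = stack.pop()
        cost = len(genotype ^ sample_muts)  # symmetric difference
        if best_cost is None or cost < best_cost:
            best_node, best_cost = node, cost
        for child in children.get(node, ()):
            child_genotype = genotype | frozenset(branch_muts.get(child, ()))
            stack.append((child, child_genotype))
    return best_node, best_cost

# Toy tree with made-up mutation labels.
children = {"root": ["n1", "n2"], "n1": ["s1", "s2"]}
branch_muts = {
    "n1": {"A100G"},
    "s1": {"C200T"},
    "s2": {"G300A"},
    "n2": {"T400C"},
}
placement = place_sample(children, branch_muts, {"A100G", "C200T", "G500T"}, "root")
print(placement)
```

Because each node's genotype is derived from its parent's, the tree only needs to store per-branch mutations rather than full sequences, which is one reason a tree-based encoding can be compact.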

9.

Stability of SARS-CoV-2 Phylogenies (preprint)

Yatish Turakhia; Bryan Thornlow; Landen Gozashti; Angie Hinrichs; Jason Fernandes; David Haussler; Russ Corbett-Detig.

biorxiv; 2020.

Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2020.06.08.141127

ABSTRACT

The SARS-CoV-2 pandemic has led to unprecedented, nearly real-time genetic tracing due to the rapid community sequencing response. Researchers immediately leveraged these data to infer the evolutionary relationships among viral samples and to study key biological questions, including whether host viral genome editing and recombination are features of SARS-CoV-2 evolution. This global sequencing effort is inherently decentralized and must rely on data collected by many labs using a wide variety of molecular and bioinformatic techniques. There is thus a strong possibility that systematic errors associated with lab-specific practices affect some sequences in the repositories. We find that some recurrent mutations in reported SARS-CoV-2 genome sequences have been observed predominantly or exclusively by single labs, co-localize with commonly used primer binding sites and are more likely to affect the protein coding sequences than other similarly recurrent mutations. We show that their inclusion can affect phylogenetic inference on scales relevant to local lineage tracing, and make it appear as though there has been an excess of recurrent mutation and/or recombination among viral lineages. We suggest how samples can be screened and problematic mutations removed. We also develop tools for comparing and visualizing differences among phylogenies and we show that consistent clade- and tree-based comparisons can be made between phylogenies produced by different groups. These will facilitate evolutionary inferences and comparisons among phylogenies produced for a wide array of purposes. Building on the SARS-CoV-2 Genome Browser at UCSC, we present a toolkit to compare, analyze and combine SARS-CoV-2 phylogenies, find and remove potential sequencing errors and establish a widely shared, stable clade structure for a more accurate scientific inference and discourse. 
Foreword: We wish to thank all groups that responded rapidly by producing these invaluable and essential sequence data. Their contributions have enabled an unprecedented, lightning-fast process of scientific discovery, truly an incredible benefit for humanity and for the scientific community. We emphasize that most lab groups with whom we associate specific suspicious alleles are also those who have produced the most sequence data at a time when it was urgently needed. We commend their efforts. We have already contacted each group and many have updated their sequences. Our goal with this work is not to highlight potential errors, but to understand the impacts of these and other kinds of highly recurrent mutations so as to identify commonalities among the suspicious examples that can improve sequence quality and analysis going forward.
